Identification and Candidate Gene Evaluation of a Large Fast Neutron-Induced Deletion Associated with a High-Oil Phenotype in Soybean Seeds

Since the dawn of agriculture, crops have been genetically altered for desirable characteristics. This has included the selection of natural and induced mutants. Increasing the production of plant oils such as soybean (Glycine max) oil as a renewable resource for food and fuel is valuable. Successful breeding for higher oil levels in soybeans, however, usually results in reduced seed protein. A soybean fast neutron population was screened for oil content, and three high oil mutants with minimal reductions in protein levels were found. Three backcross F2 populations derived from these mutants exhibited segregation for seed oil content. DNA was pooled from the high-oil and normal-oil plants within each population and assessed by comparative genomic hybridization. A deletion encompassing 20 gene models on chromosome 14 was found to co-segregate with the high-oil trait in two of the three populations. Eighteen genes in the deleted region have known functions that appear unrelated to oil biosynthesis and accumulation pathways, while one of the unknown genes (Glyma.14G101900) may contribute to the regulation of lipid droplet formation. This high-oil trait can facilitate the breeding of high-oil soybeans without protein reduction, resulting in higher meal protein levels.


Introduction
Oil content and quality have a strong, well-understood genetic component that acts through the oil biosynthetic pathway and its regulation. Seed oil is biosynthesized during the second main stage of seed maturation [1-3], at which time the relevant biosynthetic enzymes are highly expressed. For instance, in studies of expression profiles of triacylglyceride (TAG) biosynthetic enzymes and oil accumulation in developing soybeans (Glycine max), DGAT1 shows an expression profile suggesting a dominant role in soybean oil biosynthesis, but DGAT2 and PDAT do not [4][5][6].
It is becoming increasingly possible to alter hydrocarbon flux in soybeans. Multiple studies indicate that oil content increases with higher expression of TAG biosynthetic genes [7][8][9]. Increased expression of regulatory genes that upregulate multiple enzymes for fatty acid biosynthesis can also result in higher oil levels [9]. Co-expression of the transcription factor WRI1 with DGAT1, a key rate-limiting enzyme, has been shown to have a synergistic effect on TAG biosynthesis in plants [10][11][12]. Overall, increasing sink strength results in increased oil and protein, with a pronounced effect on protein and a smaller effect on oil [13].
Cultivar effects on protein content and amino acids in soybeans are well documented, most likely due to heritable differences in TAG biosynthetic genes and regulatory factors. For instance, germplasm line N6202 produced seeds with 45.7% protein content and a 10% reduction in grain yield compared to a control variety, NC-Roy [14]. In contrast, TN03-350 and TN04-5321 achieved 43.1-43.9% protein content without sacrificing seed yield. Considering the importance of grain yields in commercial varieties, such a result is more desirable than a decrease in yield with improved grain quality. In soybean genotypes of early maturity groups, average-to-high protein content (399-476 g kg−1) was found in years with high air temperature and moderate rates of rainfall during the seed-filling period, whereas seed protein content was drastically reduced (265-347 g kg−1) in seasons of insufficient nitrogen fixation or higher amounts of precipitation during seed filling [15].
In plant breeding, random mutagenesis is a common way to generate mutations and increase genetic diversity for traits with limited natural variation. Common examples include the use of a chemical mutagen like ethyl methane sulfonate (EMS) or bombardment with gamma radiation, and today even site-directed mutagenesis is possible [16]. While these methods can produce useful traits, such as an early flowering mutant or increased oil content, it is possible that other important genes could be mutated, causing undesirable phenotypes, such as altered seed composition. However, mutagenized populations are not brought under the same scrutiny as transgenic approaches, and therefore traits induced this way are much easier to incorporate into the existing germplasm.
While utilization of mutagens can introduce much-needed levels of genetic and phenotypic diversity, it is imperative to understand the nature of the mutations induced. The mutagen used in this study, fast neutron bombardment (FN), typically induces deletion and/or chromosomal rearrangement events ranging from several base pairs to several megabases in length [17][18][19]. The phenotypic variation can be used to study the association of genes with specific traits [20] or as a source of new variation for breeding purposes [21][22][23]. Comparative genomic hybridization (CGH) is one of the fastest and most effective ways to assess duplications or deletions caused by irradiation mutagenesis, such as FN. This technology utilizes oligonucleotide probes affixed to slides, known as microarrays. Fluorescently labeled DNA samples can then be hybridized to the probes, thus emitting a fluorescence intensity proportional to the DNA sample copy number for each probe sequence. The mutant DNA sample can be compared to a control sample (the non-mutagenized parent DNA) to identify tracts of sequence that have been duplicated or deleted in the mutant compared to its parent line [17,18,24]. In the case of the current study, the soybean microarray consists of 700,000+ features, allowing for a probe spacing of approximately 1000 base pairs across the euchromatic portion of the genome. With advances in next-generation sequencing and microarray technology, combined with intimate knowledge of the soybean genome, we can now harness CGH technology to quickly and precisely assess copy number variants (CNVs) in segregating mutant backcrosses.
Here, we utilized a fast neutron population of soybeans which exhibit 3-to-4% higher oil content than the parent variety with only a minor decrease in protein, resulting in seeds with increased oil plus protein content compared to the parent varieties. We hypothesize that the loss of a specific gene or genes within the deleted 300 kb region on chromosome 14 is responsible for the high-oil phenotype. Our analysis provides new insight into which of these genes may be the most likely to cause this change and may be the best candidate for future functional analyses.

Genetic Material
Seeds from three mutant lines, 1R22C28Cgadbr355aMN13, 5R12C21Dar387dMN13, and 5R16C01Dar388eMN13, and their parent, M92-220, from the University of Minnesota, were requested from their fast neutron (FN) mutant library in April 2014 and planted in the field in Lexington, KY, during the summer of 2014 to verify high-oil content in that environment using a single-line, randomized complete block design with border seed of the "Jack" cultivar. Each line comprised ten plants, with three replications in total, and the seed of each plant was bulked individually and analyzed via non-destructive NIRS [25][26][27].
M92-220 is a maturity group "I" soybean produced by the University of Minnesota soybean breeding program. It is derived from the cultivar "MN1302" (PI 616498), which was originally selected from a cross between "Hendricks" × "Archer" [28]. M92-220 exhibits an indeterminate growth habit, purple flowers, grey pubescence, brown pods, yellow seeds, and a buff hilum.

Backcrossing
Two mutant lines (1R22C28Cgadbr355aMN13 and 5R12C21Dar387dMN13) in the M8 and M5 generations, respectively, exhibited high oil content in the KY environment and were thus planted and backcrossed to the parent line M92-220 in 2015, similar to other methodologies [18]. Successful crosses were harvested, and eleven F1 crosses were assessed for oil content via single-seed NMR. These were then grown in a greenhouse in the winter of 2015-2016 and in the spring of 2016. Thirty F2 seeds from each of the F1 plants were assessed for single-seed oil content and then planted at Spindletop Farm in Lexington, KY. In addition, 30 other randomly selected seeds were planted for each line, bringing the total to 60 F2 seeds planted per line.

Tissue Sampling, Seed Composition, and DNA Extraction
Leaf tissue was collected and frozen at −80 °C [29] for comparative genomic hybridization (CGH) analysis from F2 plants. Seeds were harvested from each plant in the fall, and each F2 plant's F3 seed was analyzed in bulk with a Perten DA7200 NIRS for oil, protein, and moisture content. These data were used to generate scatter plots of the F2 sibs. Strong inverse correlations of oil and protein content suggested that segregations of the mutant trait were found in three F2 populations.

NMR Methods
Oil content was determined by single-seed NMR, using a Minspec 20 (Bruker Biospin, The Woodlands, TX, USA) [25]. The instrument accommodates a 20 mm sampling-tube diameter. Seeds were weighed and placed into the tube and allowed to warm to 40 °C before insertion into the instrument. The standard oil seed measurement procedure supplied with the instrument controller was used. Calibration of the NMR instrument was performed using weighed amounts of extracted soybean oil encompassing the range of oil weights of the seed samples. Four different weights were used and were expressed on tissue paper at the bottom of the 20 mm sample tube.
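The calibration step described above can be illustrated with a minimal sketch. It assumes a simple linear relationship between NMR signal and oil mass, which is an assumption made here for illustration; the instrument's supplied procedure is not described in the text. The function names and all numeric values are hypothetical and are not data from this study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def percent_oil(signal, seed_mass_mg, slope, intercept):
    """Convert one seed's NMR signal to percent oil by mass."""
    oil_mass_mg = (signal - intercept) / slope  # invert the calibration line
    return 100.0 * oil_mass_mg / seed_mass_mg


# Hypothetical calibration: four weighed oil standards (mg) and their signals.
oil_standards_mg = [10.0, 20.0, 30.0, 40.0]
nmr_signals = [5.1, 10.0, 15.1, 20.0]

slope, intercept = fit_line(oil_standards_mg, nmr_signals)
example = percent_oil(signal=16.0, seed_mass_mg=160.0,
                      slope=slope, intercept=intercept)
print(f"estimated oil: {example:.1f}% of seed mass")
```

Choosing standards that bracket the oil weights of the seed samples, as the text describes, keeps every seed measurement inside the fitted range rather than relying on extrapolation.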

CGH Analysis
DNA was extracted on a per-plant basis using a QIAGEN Plant DNeasy Kit from the leaves of three F2 populations (known here as A1, A2, and A3) expected to segregate for seed-oil content. The F2 plants that gave rise to F2:3 seeds with the highest seed oil ("high-oil" plants) and lowest seed oil ("normal-oil" plants) were respectively identified using NIRS. The DNA from the high-oil F2 plants was bulked, and the DNA for the normal-oil F2 plants was bulked for each of the three populations. The bulked DNAs were then subjected to CGH analyses, as previously described [17], using a custom NimbleGen CGH microarray with more than 700,000 probes, spaced at approximately one probe per 1 kb [17]. The CGH probe positions were designed according to the soybean cultivar Williams 82 genome version 2 assembly (Wm82.a2.v1). Each CGH was performed as a comparison within each population (e.g., A1 high-oil versus A1 normal-oil, etc.). From these comparisons, a large deletion was detected on chromosome 14, and the genes in that region (according to the Williams 82 version Wm82.a2.v1 gene annotation set) were analyzed on soybase.org and cross-referenced to homologous genes in the TAIR database.
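As an illustration of how a deletion enriched in one bulk shows up in this kind of data, the sketch below scans per-probe log2 ratios (high-oil bulk versus normal-oil bulk) for runs of consecutive probes below a cutoff. This is not the segmentation procedure of the cited CGH protocol [17], which is not detailed here; the function name, the threshold of -0.5, and the minimum run of 5 probes are all assumptions chosen for the example.

```python
def call_deletion_segments(positions, log2_ratios, threshold=-0.5, min_probes=5):
    """Return (start_bp, end_bp, n_probes) for each run of consecutive probes
    whose log2 ratio falls below `threshold`.

    `positions` must be sorted in ascending order along one chromosome.
    `threshold` and `min_probes` are illustrative assumptions.
    """
    segments = []
    run = []  # probe positions in the current below-threshold run
    for pos, ratio in zip(positions, log2_ratios):
        if ratio < threshold:
            run.append(pos)
        else:
            if len(run) >= min_probes:
                segments.append((run[0], run[-1], len(run)))
            run = []
    if len(run) >= min_probes:  # close a run that reaches the last probe
        segments.append((run[0], run[-1], len(run)))
    return segments


# Hypothetical probes spaced 1 kb apart; 8 consecutive probes are depleted.
positions = list(range(0, 20_000, 1_000))
ratios = [0.0] * 5 + [-1.0] * 8 + [0.0] * 7
print(call_deletion_segments(positions, ratios))
```

Requiring several adjacent probes guards against calling a deletion from a single noisy feature, which matters when probes are spaced roughly 1 kb apart.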

Functional Analysis
SignalP-6.0 (https://services.healthtech.dtu.dk/services/SignalP-6.0/, accessed on 1 January 2024) was used to investigate possible signal peptides. The topology analysis was performed by Protter (https://wlab.ethz.ch/protter/start/, accessed on 1 January 2024) [30], and the protein network was established using the STRING database (https://string-db.org, accessed on 1 January 2024) [31]. Interactions in STRING are derived from genomic-context predictions, high-throughput lab experiments, co-expression, automated text-mining, and previous knowledge in databases. Phyre2, also known as Protein Fold Recognition Server, is a web portal for protein modeling, prediction, and analysis (http://www.sbg.bio.ic.ac.uk, accessed on 1 January 2024) [32]. Phyre2 was used to study the function of unknown genes in this study, and Chimera software version 1.11.2 was used to visualize the structural protein models [33].

Oil and Protein Content of Backcrosses
Three mutant lines were selected for analysis based on previously observed high seed oil content when grown in MN. The full names of these lines (1R22C28Cgadbr355aMN13, 5R12C21Dar387dMN13, and 5R16C01Dar388eMN13) are abbreviated herein as "1R22", "5R12", and "5R16" for simplicity. When grown in KY in 2015, these lines again showed a higher mean seed oil content, though the difference relative to their wild-type parent line, M92-220, was not as pronounced as previously observed in MN (Table 1). We attempted to backcross the mutants to M92-220, resulting in eleven successful crosses, all of which came from two of the mutants: 1R22 × M92-220 and 5R12 × M92-220. The oil content of the resulting F1 seeds ranged from 17.3 to 22.5% (Table 2), as determined by single-seed NMR. These F1 seeds were grown in a greenhouse over winter and designated as A1 through B3 plants. A5, A6, A7, and B3 seeds were planted but were not viable or did not survive to maturity. The F2 seeds resulting from the greenhouse planting were harvested and bulked to measure oil and protein content for all the seeds from each F2 population via NIRS (Table 3). Furthermore, thirty seeds were randomly selected from each of these populations for single-seed NMR analysis. Table 4 shows data from the A1 population. These 30 F2 seeds analyzed via NMR were then planted in the field for each respective population, along with an additional 30 F2 seeds per population that had no single-seed data. Leaf tissue was collected during the two-leaf stage for the F2 plants, and the F3 seeds were harvested at maturity. NIRS bulk data were collected on each F2:3 sample, using approximately 100 seeds. The protein and oil estimates for these families are shown for lines derived from the A1, A2, and A3 F1 lineages, which are all from crosses between 1R22 and the M92-220 WT (Table 5). Scatter plots of the protein and oil estimates for a deeper sampling of F2:3 families are shown (Figure 1); each data point represents the values for a given F2:3 sample of seeds. The populations generally showed good spread in their data, with inverse correlations between protein and oil. This indicates that a genetic factor may have segregated in the F2 plants, perhaps an FN-induced mutation that is associated with the high-oil trait.
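The inverse relationship between protein and oil noted above can be quantified with a Pearson correlation coefficient. The following is a minimal sketch only: the text does not state which statistic, if any, was computed, and the values below are hypothetical placeholders rather than measurements from Table 5 or Figure 1.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical F2:3 family values (percent, dry-weight basis).
protein = [40.5, 39.8, 39.1, 38.6, 38.0, 37.4]
oil = [19.6, 20.4, 21.3, 21.9, 22.8, 23.5]

r = pearson_r(protein, oil)
print(f"r = {r:.2f}")  # a negative r indicates an inverse relationship
```

A coefficient near -1 within a population would be consistent with a well-separated spread, whereas a value closer to zero would suggest that high-oil and normal-oil families are harder to tell apart by phenotype.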
Table 2. Oil content of individual F1 seeds. Each F1 seed was assigned a population name for reference to the future generations raised from that seed. The first parent mentioned is the maternal plant, and the second mentioned is the paternal. Oil content was determined by single-seed NMR using a Minspec 20 (Bruker Biospin, The Woodlands, TX, USA). Seeds were weighed and placed into the tube and allowed to warm to 40 °C before insertion into the instrument. The standard oil seed measurement procedure supplied with the instrument controller was used. n = 3 technical replications on each seed, and SEs are in parentheses. Herein, the lines are referred to by the population name rather than parents for clarity.
Table 4. Single-seed oil and protein of 30 randomly selected F2 seeds (from F1 plants) selected for field planting, all of which are sibs. The field # (number) serves as an identifier related to seeds retained in long-term storage and used to track lineages. Oil and protein were determined for each seed using single-seed NIRS with 3 technical reps per seed. Single seeds were also surveyed in a similar way for lines A2, A3, A4, A8, B1, and B2, whose genetic lineages are described in Table 3.
Table 5. Seed oil and protein from F2:3 lines A1, A2, and A3 (all originally derived from 1R22 × WT crosses). Each row represents data from a sample of F3 seeds derived from an individual F2 plant. Oil and protein were determined via NIRS using a Perten DA7200 spectrometer and calculated on a 0% moisture basis. Each population (A1, A2, and A3) was split into a high-oil (bold text) and normal-oil (non-bold text) group. These groupings were used to determine which F2 DNA samples were bulked together for CGH analyses (high-oil bulk versus normal-oil bulk).

CGH Analysis Reveals a Strong Candidate Deletion for the High-Oil Mutant Phenotype
We chose to further investigate the A1, A2, and A3 populations to determine if an FN-induced deletion was co-segregating with the high-seed-oil phenotype in the 1R22 mutant populations. We collected DNA for each of the F2 plants grown in the field in KY. We performed a bulk segregant analysis using CGH to see if there were any deletions that co-segregated with the high-oil phenotype. Based on the F2:3 NIRS data (Table 5), we binned each F2 into those that gave rise to high seed oil (high-oil, bold) and those that gave rise to normal seed oil levels (normal-oil, non-bold) within each respective population. We then bulked the DNA for the high-oil and normal-oil plants, respectively. These bulked DNA samples were subjected to CGH analyses to compare the deletion/duplication profile of the high-oil and normal-oil bulks for each of the three populations. We expected that the CGH profile would show differential hybridization for any deletion that was enriched in the high-oil segregating bulk of individuals compared to the normal-oil segregating bulk.
One large deletion event (>300 kb) on chromosome 14 was enriched in the high-oil bulks of the A1 and A3 populations (Figure 2), as evidenced by the string of data points below the log2 value of zero. (Zero indicates no difference between the high-oil bulk and the normal bulk, whereas values below zero indicate an enrichment for an FN deletion in the high-oil bulks compared to the normal bulks.) Specifically, the deletion was found on Chr14, from bp 9,994,086 to 10,301,954, a span of approximately 308 kb. The deletion is more pronounced in the A1 population, indicating that a greater proportion of individuals in that population were likely homozygous for the large deletion. Nonetheless, the A3 population also shows an enrichment for the large deletion in the high-oil bulk compared to the normal bulk. We know from previous experience that it is difficult to determine perfect bulks based on phenotypes for seed composition traits [18]. Thus, it is not surprising that our three populations did not all show similar enrichment. In fact, the A2 population did not show enrichment for this deletion in its high-oil bulk. It is worth noting, however, that the spread of the oil data in the A2 population was less distinct than the spread from the A1 and A3 populations (Figure 1), indicating that the A2 plants were more likely to have their DNA samples bulked into the wrong group. Thus, we tentatively conclude that the large deletion on chromosome 14 is likely co-segregating with the high-seed-oil phenotype in these mutant populations.
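The span and the log2 scale quoted above can be checked with a short worked example. The coordinates are those reported in the text; the probe spacing of 1 kb is the approximate figure given in the methods, and the helper names are hypothetical.

```python
import math

DELETION_START_BP = 9_994_086   # Chr14 start coordinate reported in the text
DELETION_END_BP = 10_301_954    # Chr14 end coordinate reported in the text
PROBE_SPACING_BP = 1_000        # approximate spacing stated in the methods


def span_kb(start_bp, end_bp):
    """Length of an interval in kilobases."""
    return (end_bp - start_bp) / 1_000


def approx_probe_count(start_bp, end_bp, spacing_bp):
    """Rough number of array probes expected inside the interval."""
    return (end_bp - start_bp) // spacing_bp


def log2_ratio(high_oil_intensity, normal_oil_intensity):
    """CGH log2 ratio; 0 means equal signal, negative means depletion."""
    return math.log2(high_oil_intensity / normal_oil_intensity)


print(span_kb(DELETION_START_BP, DELETION_END_BP))
print(approx_probe_count(DELETION_START_BP, DELETION_END_BP, PROBE_SPACING_BP))
print(log2_ratio(1.0, 1.0), log2_ratio(0.5, 1.0))
```

On this scale, a high-oil bulk carrying half the signal of the normal-oil bulk at a probe gives a log2 ratio of -1, so a deeper trough implies a larger share of deletion-carrying chromosomes in the high-oil bulk.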

Functional Analysis
The deletion on chromosome 14 encompasses 20 soybean gene models. All annotated functions of the 20 putative genes in the deleted region are summarized (Table 6). Of the 20 genes, 18 have a readily predicted function, but none appears to have an obvious connection to seed oil phenotypes. Tables 6-8 contain information about all genes within the deletion region. Among the genes in these tables, Glyma.14G102100 and Glyma.14G101900 had no readily predicted function. Therefore, bioinformatic analyses were performed to reveal possible functions for these genes. Glyma.14G101900, with 82 amino acids, is composed of 61% disordered protein (lacking an ordered three-dimensional structure) (Table 7) and is predicted to be a transmembrane protein by TAIR. The membrane topology of this protein was predicted and illustrated by Protter (Figure 3). The protein network for Glyma.14G101900 (Figure 4) and the PDB models of both Glyma.14G101900 and Glyma.14G102100 were predicted by Phyre2 and illustrated by Chimera (Figures 5 and 6).

AT5G10810
As one of 25 candidate AtPNP-As which showed weak interaction strength in the yeast two-hybrid (Y2H) analysis.
AtPNP-A = plant natriuretic peptides (PNPs), which comprise a novel class of hormones that systemically affect salt and water balance and responses to plant pathogens. [34]

AT4G31870
Glutathione peroxidase 7 (GPX7) is one of the major ROS-scavenging enzymes that catalyze the reduction of H2O2 in order to prevent potential H2O2-induced cellular damage. GPX7 residues Cys-108, Gln-143, and Trp-197 are potential catalytic residues found to be strictly conserved. [36]
GPX7 is linked to the establishment of photooxidative stress tolerance and basal resistance to P. syringae infection. [37]
GPX7 belongs to a family of thiol-based glutathione peroxidases that catalyze the reduction of H2O2 and hydroperoxides to H2O or alcohols using glutathione as an electron donor. Plant GPXs are implicated in redox signal transduction. [38]
Overexpression of VaAQ, a putative GARP-type transcription factor of Amur grape (Vitis amurensis), increases antioxidant enzyme activities and upregulates ROS scavenging-related genes such as GPX7 under cold stress. [39]
Strongly induced in carotenoid-accumulating Arabidopsis roots. [40]
Expressed highly in response to oxidation-reduction processes. [41]
Molecular analysis indicates that glutathione peroxidase 7 (GPX7) is specifically induced to compensate for the absence of APx-R (ascorbate peroxidase-related). (Peroxidases are enzymes that catalyze the reduction of hydrogen peroxide, thus minimizing cell injury and modulating signaling pathways in response to this reactive oxygen species.) [42]
The transcript abundance of GPX7 (At4g31870) was increased by a cryoprotectant treatment.

AT4G31860
A protein phosphatase among the differentially expressed genes (DEGs) involved in the cold response.

AT4G31830
A gene that may be involved in the drought response and upregulated in leaf tissue. [72]
A gene that shows tissue specificity, as well as expression conservation in rice and Arabidopsis seeds.

AT5G10840
Endomembrane protein (70 protein family) that is downregulated by Colletotrichum acutatum in strawberry crown tissue. [75]
AT5G10840 encodes a highly altered redox-regulated protein in response to 3 mM bicarbonate treatment in A. thaliana var. Landsberg erecta. [76]
A conserved syntenic region that pairs between the pseudo-ancestral Arabidopsis genome and Prunus genetic maps. Endomembrane protein 70, putative TM4 family. [77]
The transmembrane proteins identified from the plasma membrane of poplar differentiating xylem and phloem. Endomembrane protein 70, putative. [78]

AT1G18400
BRASSINOSTEROID ENHANCED EXPRESSION1 (BEE1) (At1g18400) is a low-temperature regulator of flavonoid accumulation. BEE1 and GFR (G2-LIKE FLAVONOID REGULATOR) were both shown to negatively regulate anthocyanin accumulation by inhibiting anthocyanin synthesis genes via the suppression of the bHLH (TRANSPARENT TESTA8 (TT8) and GLABROUS3 (GL3)) and/or the MYB (PRODUCTION OF ANTHOCYANIN PIGMENTS2 (PAP2)) components of the MBW complex. [79]
Arabidopsis BEE1 (AT1G18400) is the orthologue of bHLH056 in papaya. bHLH056 may be involved in the process of ABA stress but has a different function compared to Arabidopsis. [80]
A positive regulator of flavonoid accumulation at low temperatures. [79]
A brassinosteroid signaling component and a positive regulator of shade avoidance syndrome. [82]
Genes encoding the BR (brassinosteroid) signaling components BEE3 (AT1G73830) and BEE1 (AT1G18400) are significantly upregulated by ethanol treatment, suggesting that the BR pathway is also involved in plant responses to ethanol. [83]
BEE1 is among the BR-related transcription factors that are upregulated by BR and encode putative AtMYC2 (bHLH) proteins in A. thaliana. One of the three redundant brassinosteroid early response genes that encode putative bHLH transcription factors required for normal growth. [84]
One of the differentially expressed genes related to flowering in the photoperiod pathway. [85]
BEE1 is a positive regulator of photoperiod flowering and promotes flowering by directly binding to the floral integrator FT.

AT1G21280
A SNP predicted to be associated with brown rot resistance in peach. [89]
Homologue of the Tp57577_TGAC_v2_mRNA41271.v2 gene, whose transcription is involved in regrowth after mowing of red clover (Trifolium pratense) and is influenced by location and environmental conditions. [90]
Is a duplicated region in Chr 10 of soybean associated with seed protein content. [91]
Table 8. Cont.


Discussion
Forward screening of mutant populations for seed composition traits has been utilized with success in many crop species, including soybean seed composition changes induced by FN [18,21,22]. Soybean FN mutant families are powerful due to large variations in mutation sizes, from several bp deletions up to Mb-sized deletion events [17][18][19]24]. We utilized this resource and a forward screening approach to identify three mutants with high seed oil content and lower decreases in protein than are found in varieties produced using conventional breeding techniques [126][127][128]. These mutants were then backcrossed to the parent variety, which, in theory, should produce heterozygous offspring for the mutant traits in the F1 population. After a generation of self-pollination, we would expect to observe a genotypic segregation ratio of 1:2:1 for the homozygous mutant/heterozygous/homozygous wild type.
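The expected 1:2:1 segregation can be made concrete with a chi-square goodness-of-fit sketch. This is illustrative only: the counts below are hypothetical, not genotype data from this study, and the function names are invented for the example. The critical value 5.991 is the standard chi-square cutoff for 2 degrees of freedom at a significance level of 0.05.

```python
def expected_121(n_plants):
    """Expected genotype counts under 1:2:1 segregation
    (homozygous mutant, heterozygous, homozygous wild type)."""
    return [n_plants * 0.25, n_plants * 0.5, n_plants * 0.25]


def chi_square_121(observed):
    """Chi-square goodness-of-fit statistic against a 1:2:1 ratio."""
    expected = expected_121(sum(observed))
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_DF2_ALPHA05 = 5.991  # chi-square critical value, df = 2, alpha = 0.05

# Hypothetical counts for 60 F2 plants from one population.
observed_counts = [14, 32, 14]
statistic = chi_square_121(observed_counts)
fits_121 = statistic < CRITICAL_DF2_ALPHA05

print(expected_121(60))
print(round(statistic, 3), fits_121)
```

With 60 F2 plants per population, the expectation is 15 homozygous mutant, 30 heterozygous, and 15 homozygous wild-type plants, so only about a quarter of a population would be expected to carry the deletion on both chromosomes.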
In the populations A1, A2, and A3, phenotype segregation was apparent in the F2:3 seeds, with oil content ranging from 19 to 24%, spread over a somewhat normal distribution. Assuming that the high-oil phenotype is caused by an FN-induced deletion, CGH may be able to show the deletion event when comparing the highest and lowest oil F2 bulked DNA samples, which was clearly observed in the A1 population. Furthermore, the size of the deletion detected, about 300 kb, is within the size range we anticipate and is frequently observed in CGH on soybean FN mutants [17,18,24].
While the size of the deletion event may imply that it is the source of observed phenotypic variation, it is also important to examine the deleted genes to begin to hypothesize about the mechanism which causes the observed variability. To date, many efforts have been made through mutagenesis, conventional breeding, and biotechnology to increase oil content of seeds. It has been well established that the ratio of sucrose/asn+gln from the mother plant significantly alters oil and protein content but results in an inverse correlation of these traits [129-131].
There is a genetic component to this, as high oil is heritable when selected in this manner, but usually results in a corresponding and roughly equal loss of protein. Metabolic engineering efforts have been effective at elevating oil content without the corresponding loss in protein by utilizing push, pull, and protect mechanisms. First, the "push" mechanism directs hydrocarbon resources toward the oil biosynthetic pathway, creating an abundant source of metabolic precursors. The transcription factor WRI1 is known to operate in this fashion [12,132,133]. Next, "pull" mechanisms occur later in the pathway and use downstream metabolites at a faster rate, thereby causing upstream resources to be redirected into the pathway [11].
The enzyme diacylglycerol acyltransferase (DGAT) catalyzes the final and only dedicated step to TAG synthesis via the Kennedy pathway by combining a diacylglycerol molecule with an acyl-CoA [134]. Biochemical studies have also determined that this is a rate-limiting step in many species, so increasing the speed of TAG formation via DGAT increases the speed of the entire pathway [135]. DGAT-overexpression studies confirm this phenomenon, and DGAT-overexpressing plants have significantly higher oil levels, with no decrease in protein [136,137]. Lastly, "protect" mechanisms ensure that TAGs already formed do not degrade. For example, lipase knockouts [138] exhibit increased oil content, as do plants overexpressing oleosins, which are proteins that stabilize storage oil bodies [11]. Better yet, plants which are engineered with two or more of these steps exhibit synergistic increases in oil, such as tobacco leaves with up to 30% on a dry-weight basis of storage lipids [12].
In 1R22, the main FN mutant from this study, 20 gene models are located within a deletion that co-segregates with the high-oil phenotype. Eighteen of these genes have known functions but do not have an obvious fit to the high-seed-oil mutant phenotype. One of the unknown genes, Glyma.14G102100, is predicted to be a transposon ty3-g gag-pol polyprotein with 99.4% confidence and classified as a DNA-binding protein by Phyre2 (Table 7). We have no reason to suspect that this gene is involved in the mutant high-oil trait. The other unknown gene, Glyma.14G101900, has a COOH terminus predicted to be inside the plasma membrane, while the H2N terminus is predicted to be extracellular. A signal peptide analysis also showed that Glyma.14G101900 has no signal peptide (Figure 3). Some articles state that disordered proteins mainly trigger cellular stress responses or affect protein interaction networks. Ma et al. [139] stated that the deletion of such a disordered region enhances oil accumulation in Arabidopsis. The hydrophobicity surface and other views of the Glyma.14G101900 and Glyma.14G102100 PDB models were predicted by Phyre2 and illustrated via Chimera (Figures 5 and 6).
Protein network analysis performed by the STRING database indicates that Glyma.14G101900 has an interaction with four main proteins (STRING identifiers: I1L6A3, I1MQK0, A0A0R0F173, and I1L3W9), as shown in Figure 4. I1L3W9 is an uncharacterized protein that belongs to the short-chain dehydrogenase/reductase (SDR) family. A0A0R0F173 is an AB hydrolase-1 domain-containing protein. I1MQK0 is also an uncharacterized protein that belongs to the short-chain dehydrogenase/reductase (SDR) family. SDR enzymes have critical roles in lipid metabolism [140].
I1L6A3 is Seipin 1A, which has a role in lipid droplet formation and storage and is necessary for both adipogenesis and lipid droplet (LD) organization [141]. At this stage, we do not know how a deletion of Glyma.14G101900 would change the interactions between these proteins to produce high oil. It is possible that Glyma.14G101900 has a negative interaction with these proteins, specifically Seipin, such that its deletion increases oil production.
We also speculate that one of the proteins in the deleted section may have a role in reducing triacylglycerol biosynthesis. Thus, many pathway analyses were performed, but no clear role of any of the gene products in triacylglycerol biosynthesis has been uncovered so far. The investigations of possible functions of these putative genes (especially from available RNA-seq data) are summarized in Table 8.
There seem to be several plausible hypotheses for how this deletion event may be influencing oil content in seeds; however, further studies are needed to confirm this. First, it would be useful to establish a genetic marker for this deletion to confirm that this location is the source of the high-oil phenotype. This could be a simple PCR amplicon across the deletion boundaries, which would provide a PCR product in plants that carry the deletion and no product in plants that do not carry the deletion. This would be analogous to a VgDGAT marker that was used to track a transgene conferring a high-oil phenotype [136,137]. In other CGH mutant lines, this method was used to confirm a FAD2 gene deletion, and the high oleic acid content correlated directly with the presence or absence of a PCR marker [17]. In addition, cloning of the genes in this region and inserting them into the mutant via transgenesis may be able to rescue the wild-type phenotype and precisely pinpoint the gene responsible for increased oil content. In the future, a full functional analysis could systematically reveal the candidate gene or genes within this deletion that underlie the changes in phenotype. Breeders and growers are not very concerned with the exact gene or mechanism which increases oil content; they need to know only that the change is heritable, is easily identifiable with standard assays, and has limited other detrimental effects on the phenotype. Ultimately, it seems that the loss of a specific gene or genes within the deleted 300 kb region on chromosome 14 is responsible for the high-oil phenotype, and a further analysis could pinpoint the ultimate cause of these changes.
Funding: Funding for DNA extraction and the oil and protein analysis was provided by the Kentucky Soybean Board, and CGH analysis was provided by the United Soybean Board, Project #1520-532-5603.

Figure 1. Scatter plots of the protein (x-axes) and oil (y-axes) content of F2:3 seeds. Each data point represents the NIRS values for each respective F2:3 family (e.g., corresponding to the rows in Table 5 for the A1, A2, and A3 populations). Seeds were analyzed for percent oil and protein content on a dry-weight basis via bulk-seed NIRS with n = 3 technical replications for each F2:3 seed lot.

Figure 2. A large copy number variation (CNV) event detected in chromosome 14 of the A1 and A3 populations (both derived from separate crosses between 1R22 × WT) exhibited strong inverse correlations of oil and protein content. The x-axis indicates the location of each microarray feature along chromosome 14, according to the Williams 82 genome version 2 assembly (Wm82.a2.v1). The y-axis shows the log2 ratio of the CGH intensity from the high-oil versus the normal-oil bulk for each microarray feature. The blue dots show the CGH comparison between A1 high-oil versus A1 normal-oil. The orange dots show the CGH of high-oil versus normal-oil for A2, and the grey dots show the CGH of high-oil versus normal-oil for A3. These data indicate that a large deletion is enriched in the high-oil plants of the A1 and A3 populations, while the A2 population does not show this enrichment. This deletion event was detected on Chr14, from bp 9,994,086 to 10,301,954, a span of approximately 308 kb.
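The quantities named in this caption can be illustrated with a minimal Python sketch. It recomputes the deletion span from the two reported coordinates and shows how a per-feature log2 ratio of the high-oil bulk over the normal-oil bulk might be averaged inside the region. The probe positions and intensities are invented for illustration and are not data from this study; the function names are hypothetical, and the sketch assumes that a deletion enriched in the high-oil bulk lowers that bulk's signal and therefore gives negative log2 ratios.

```python
import math

# Deletion boundaries on Chr14 as reported in the caption (Wm82.a2.v1).
DEL_START = 9_994_086
DEL_END = 10_301_954


def deletion_span_bp(start, end):
    """Distance in bp between the two reported coordinates."""
    return end - start


def log2_ratio(high_oil_intensity, normal_oil_intensity):
    """log2 of the high-oil bulk signal over the normal-oil bulk signal."""
    return math.log2(high_oil_intensity / normal_oil_intensity)


def mean_log2_in_region(probes, start, end):
    """Average log2 ratio of probes positioned within [start, end]."""
    ratios = [log2_ratio(h, n) for pos, h, n in probes if start <= pos <= end]
    return sum(ratios) / len(ratios) if ratios else float("nan")


# Hypothetical probes: (position, high-oil intensity, normal-oil intensity).
# Invented values for illustration only.
example_probes = [
    (9_900_000, 1000.0, 1000.0),   # outside the region: equal signal
    (10_000_000, 250.0, 1000.0),   # inside: high-oil bulk depleted
    (10_200_000, 500.0, 1000.0),   # inside: high-oil bulk depleted
    (10_400_000, 1000.0, 1000.0),  # outside the region: equal signal
]

span = deletion_span_bp(DEL_START, DEL_END)
inside_mean = mean_log2_in_region(example_probes, DEL_START, DEL_END)
print(f"span = {span} bp (~{span / 1000:.0f} kb)")
print(f"mean log2 ratio inside region = {inside_mean:.2f}")
```

With the reported coordinates, the span evaluates to 307,868 bp, which matches the stated figure of approximately 308 kb.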

Figure 3. Predicted topology of Glyma.14G101900. It has one transmembrane region, designated by the blue 1. Extra means extracellular, and intra means inside the plasma membrane.

Figure 4. Protein network for Glyma.14G101900 clustered in three groups. Glyma.14G101900 is referred to as I1M988 (red color) in the STRING database. Green lines represent all relationships to Glyma.14G101900, whereas other colors represent multiple unique connections between genes in the associated network families.

Figure 5. Some views of the Glyma.14G101900 PDB model predicted by Phyre2 and illustrated by Chimera. (A) Ribbon view, (B) mesh surface view, (C) mesh surface with ball and stick view, and (D) hydrophobicity surface (hydrophobicity surface preset from dodger blue for the most hydrophilic to white to orange-red for the most hydrophobic).

Figure 6. Some views of the Glyma.14G102100 PDB model predicted by Phyre2 and illustrated by Chimera. (A) Ribbon view, (B) mesh surface with ball and stick view, (C) surface view, and (D) hydrophobicity surface (hydrophobicity surface preset from dodger blue for the most hydrophilic to white to orange-red for the most hydrophobic).

Table 1. Mean oil content (standard error in parentheses) of three FN lines from the field in Minnesota (2013) and Lexington, KY (2015), and of the parental line, determined on a dry-weight basis via bulk-seed NIRS. n = 3 plots in one location. Data are displayed as mean ± SE.

Table 3. Protein and oil content of bulked F2 seeds. The "Cross" column shows the parents in the original cross; the first parent shown is the maternal plant, and the second is the paternal plant. Oil and protein content was determined by bulk-seed NIRS using a Perten DA7200 Spectrometer with n = 3 technical replications on each seed batch.

Table 6. A large deletion event was detected on Chr14, from position 9,994,086 to 10,301,954 (coordinates based on the Williams 82 genome version 2 assembly, Wm82.a2.v1), a span of approximately 308 kb, in the 1R22-derived populations. * Putative transcription factors.

Table 8. Review of the putative functions of 19 Arabidopsis homologues of the genes in the deleted region of the high-oil soybean mutant.